Abstract: A Q-learning algorithm is used to solve the optimal stabilization control problem when only data, rather than a model of the plant, are available. Because the state space and control space are continuous, Q-learning can only be implemented approximately, so the proposed approximate Q-learning algorithm yields a suboptimal controller. Although the controller is suboptimal, simulations on a strongly nonlinear plant show that the closed-loop domain of attraction obtained by the proposed algorithm is larger, and the cost function smaller, than those of the linear quadratic regulator and the deep deterministic policy gradient method.
LU Chaolun, LI Yongqiang, FENG Yuanjing. Data-Driven Optimal Stabilization Control and Simulation Based on Reinforcement Learning. Pattern Recognition and Artificial Intelligence, 2019, 32(4): 345-352.
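The abstract describes approximating the Q-function over continuous state and control spaces from measured plant data and then extracting a (suboptimal) stabilizing controller from it. The following Python code is a minimal sketch of that general idea using batch fitted Q-iteration with a radial-basis-function approximator; the class and function names, the RBF model, the finite action grid, and the discounted stage cost are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Minimal sketch of approximate Q-learning (fitted Q-iteration) for a
# deterministic plant known only through sampled transition data.
# Everything here (QModel, transitions format, action grid) is illustrative.

class QModel:
    """Radial-basis-function approximator for Q(x, u)."""
    def __init__(self, centers, width=1.0):
        self.centers = np.asarray(centers)   # (m, dim_x + dim_u) RBF centers
        self.width = width
        self.w = np.zeros(len(self.centers)) # linear weights

    def features(self, xu):
        d2 = np.sum((self.centers - xu) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.width ** 2))

    def predict(self, xu):
        return self.features(xu) @ self.w

    def fit(self, XU, targets):
        # Least-squares fit of the weights to the Bellman targets.
        Phi = np.stack([self.features(xu) for xu in XU])
        self.w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)


def fitted_q_iteration(transitions, q_model, u_grid, gamma=0.98, iters=50):
    """transitions: list of (x, u, cost, x_next) tuples measured from the plant."""
    XU = np.array([np.concatenate([x, u]) for x, u, _, _ in transitions])
    for _ in range(iters):
        targets = []
        for x, u, c, x_next in transitions:
            # Minimization over a finite action grid: one source of suboptimality.
            q_next = min(q_model.predict(np.concatenate([x_next, ug]))
                         for ug in u_grid)
            targets.append(c + gamma * q_next)
        q_model.fit(XU, np.array(targets))
    return q_model


def greedy_controller(q_model, u_grid):
    """Suboptimal controller extracted from the learned Q-function."""
    def policy(x):
        costs = [q_model.predict(np.concatenate([x, ug])) for ug in u_grid]
        return u_grid[int(np.argmin(costs))]
    return policy
```

In this sketch, the finite action grid and the finite set of basis functions are the two approximations that make the resulting controller suboptimal, which is consistent with the abstract's remark that approximate Q-learning over continuous spaces can only deliver a suboptimal stabilizing controller.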